Abstract. In case-based reasoning, the adaptation step depends in
general on domain-dependent knowledge, which motivates studies 
on adaptation knowledge acquisition (AKA). CABAMAKA is an AKA 
system based on principles of knowledge discovery from databases. 
This system explores the variations within the case base to elicit 
adaptation knowledge. It has been successfully tested in an appli- 
cation of case-based decision support to breast cancer treatment. 

1 INTRODUCTION 

Case-based reasoning (CBR [4]) aims at solving a target problem
thanks to a case base. A case represents a previously solved prob- 
lem. A CBR system selects a case from the case base and then adapts 
the associated solution, requiring domain-dependent knowledge for 
adaptation. The goal of adaptation knowledge acquisition (AKA) is to 
extract this knowledge. The system CABAMAKA applies principles 
of knowledge discovery from databases (KDD) to AKA. The origi- 
nality of CABAMAKA lies essentially in the approach to AKA that 
uses a powerful learning technique that is guided by a domain ex- 
pert, according to the spirit of KDD. This paper proposes an original 
and working approach to AKA, based on KDD techniques. 

CBR and adaptation. A case in a given CBR application is usually
represented by a pair (pb, Sol(pb)) where pb represents a problem
statement and Sol(pb), a solution of pb. CBR relies on the source
cases (srce, Sol(srce)) that constitute the case base CB. In a
particular CBR session, the problem to be solved is called the target
problem, denoted by tgt. A case-based inference associates a solution
Sol(tgt) with tgt, with respect to the case base CB and to additional
knowledge bases, in particular O, the domain ontology that usually
introduces the concepts and terms used to represent the cases.

A classical decomposition of CBR consists in the steps of retrieval
and adaptation. Retrieval selects (srce, Sol(srce)) ∈ CB such that
srce is judged to be similar to tgt. The goal of adaptation is to solve
tgt by modifying Sol(srce) accordingly.

The work presented hereafter is based on the following model of
adaptation, similar to transformational analogy [1]:

① (srce, tgt) ↦ Δpb, where Δpb encodes the similarities and
   dissimilarities of the problems srce and tgt.
② (Δpb, AK) ↦ Δsol, where AK is the adaptation knowledge and
   where Δsol encodes the similarities and dissimilarities of
   Sol(srce) and the forthcoming Sol(tgt).
③ (Sol(srce), Δsol) ↦ Sol(tgt): Sol(srce) is modified into
   Sol(tgt) according to Δsol.
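The three steps of this model can be sketched in a few lines of Python. This is only an illustration under strong assumptions that are not in the paper: problems and solutions are reduced to sets of boolean properties, a variation is a triple (removed, kept, added), and the adaptation knowledge AK is a hypothetical function `toy_ak` from a problem variation to a solution variation.

```python
from typing import Callable, FrozenSet, Tuple

Props = FrozenSet[str]
# Assumed encoding of a variation: (removed, kept, added) from source to target.
Delta = Tuple[Props, Props, Props]


def delta(source: Props, target: Props) -> Delta:
    """Step 1 style comparison: dissimilarities and similarities of two sets."""
    return (source - target, source & target, target - source)


def adapt(srce: Props, sol_srce: Props, tgt: Props,
          ak: Callable[[Delta], Delta]) -> Props:
    """Chain the three steps of the adaptation model."""
    delta_pb = delta(srce, tgt)               # step 1: (srce, tgt) -> delta_pb
    removed, _kept, added = ak(delta_pb)      # step 2: (delta_pb, AK) -> delta_sol
    return (sol_srce - removed) | added       # step 3: modify Sol(srce)


def toy_ak(delta_pb: Delta) -> Delta:
    """Hypothetical adaptation knowledge: one hand-written rule."""
    removed, _kept, added = delta_pb
    if "a" in removed and "d" in added:
        return (frozenset({"A"}), frozenset(), frozenset({"C"}))
    return (frozenset(), frozenset(), frozenset())  # no change otherwise


sol_tgt = adapt(frozenset("abc"), frozenset("AB"), frozenset("bcd"), toy_ak)
```

In a real CBR system, AK would be a rule base rather than a single function, and the comparison of step 1 would be carried out in the case representation formalism.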

Adaptation is generally supposed to be domain-dependent in the
sense that it relies on domain-specific adaptation knowledge.
Therefore, this knowledge has to be acquired. This is the purpose of
adaptation knowledge acquisition (AKA).

A related work in AKA. The idea of the research presented in [3]
is to exploit the variations between source cases to learn
adaptation rules. These rules compute variations on solutions
from variations on problems. More precisely, ordered pairs
(srce-case1, srce-case2) of similar source cases are formed.
Then, for each of these pairs, the variations between the problems
srce1 and srce2 and between the solutions Sol(srce1) and Sol(srce2)
are represented (Δpb and Δsol). Finally, the adaptation rules
are learned, using as training set the set of the input-output pairs
(Δpb, Δsol). The experiments have shown that the CBR system using
the adaptation knowledge acquired by this automatic AKA system
performs better than the CBR system working without adaptation.
This research has strongly influenced our work, which is globally
based on similar ideas.

2 CABAMAKA 

Principles. CABAMAKA deals with case base mining for AKA.
Although the main ideas underlying CABAMAKA are shared with
those presented in [3], the following aspects are original. The
adaptation knowledge that is mined has to be validated by experts and
has to be associated with explanations that make it understandable
by the user. In this way, CABAMAKA may be considered as
a semi-automated (or interactive) learning system. Another
difference with [3] lies in the volume of the cases that are examined:
given a case base CB where |CB| = n, the CABAMAKA system
takes into account every ordered pair (srce-case1, srce-case2)
with srce-case1 ≠ srce-case2 (whereas in [3], only the pairs of
similar source cases are considered, according to a fixed criterion).
Thus, the CABAMAKA system has to cope with n(n - 1) pairs, a
rather large number of elements, since in our application n ≈ 750
(n(n - 1) ≈ 5 · 10^5). This is why efficient techniques of
knowledge discovery from databases (KDD [2]) have been chosen for this
system.
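The order of magnitude of the number of pairs can be checked with a short computation. The sketch below takes n = 750 as an illustrative value (the paper only gives n ≈ 750), and the function name `ordered_pairs` is ours.

```python
from itertools import permutations


def ordered_pairs(n: int) -> int:
    """Number of ordered pairs of distinct cases in a case base of size n."""
    return n * (n - 1)


# Cross-check the closed formula against an explicit enumeration on a small n.
small_n = 6
enumerated = sum(1 for _ in permutations(range(small_n), 2))

pairs_750 = ordered_pairs(750)  # 561750, i.e. a few hundred thousand pairs
```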

Principles of KDD. The goal of KDD is to discover knowledge 
from databases, with the supervision of an analyst (expert of the do- 
main). A KDD session usually relies on three main steps: data prepa- 
ration, data-mining and interpretation. 

Data preparation is based on formatting and filtering operations. 
The formatting operations transform the data into a form allowing 
the application of the chosen data-mining operations. The filtering 
operations are used for removing noisy data and for focusing the 
data-mining operation on special subsets of objects and/or attributes. 

Data-mining methods are applied to extract pieces of information
from the data. These pieces of information have some regular
properties allowing their extraction. For example, CHARM [5] is a
data-mining algorithm that efficiently performs the extraction of
frequent closed itemsets (FCIs). CHARM inputs a database in the form
of a set of transactions, each transaction T being a set of boolean
properties or items. An itemset I is a set of items. The support of
I, support(I), is the proportion of transactions T of the database
possessing I (I ⊆ T). I is frequent, with respect to a threshold
σ ∈ [0; 1], whenever support(I) ≥ σ. I is closed if it has no proper
superset J (I ⊊ J) with the same support.
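The definitions of support, frequency and closure can be made concrete with a naive enumeration. The sketch below is not CHARM: it tests every candidate itemset, which is only feasible on toy data, and it assumes a non-empty list of transactions. All names are ours.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List

Itemset = FrozenSet[str]


def count(itemset: Itemset, transactions: List[Itemset]) -> int:
    """Number of transactions T such that itemset is a subset of T."""
    return sum(1 for t in transactions if itemset <= t)


def frequent_closed_itemsets(transactions: List[Itemset],
                             sigma: float) -> Dict[Itemset, float]:
    """Brute-force FCIs: frequent itemsets with no proper superset of equal support."""
    n = len(transactions)
    items = sorted(set().union(*transactions))
    frequent: Dict[Itemset, int] = {}
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            candidate = frozenset(combo)
            c = count(candidate, transactions)
            if c / n >= sigma:
                frequent[candidate] = c
    # A superset with the same support is itself frequent, so it is enough
    # to look for it among the frequent itemsets.
    return {
        i: c / n
        for i, c in frequent.items()
        if not any(i < j and c == cj for j, cj in frequent.items())
    }


db = [frozenset("abc"), frozenset("ab"), frozenset("ac"), frozenset("a")]
fcis = frequent_closed_itemsets(db, sigma=0.5)
```

On this toy database, {b} is frequent but not closed, because {a, b} is a proper superset with the same support.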

Interpretation aims at interpreting the output of data-mining, i.e.
the FCIs in the present case, with the help of an analyst. In this way,
the interpretation step produces new knowledge units (e.g. rules).

Formatting. The formatting step of CABAMAKA inputs the
case base CB and outputs a set of transactions obtained from the
pairs (srce-case1, srce-case2). It is composed of two substeps.
During the first substep, each srce-case = (srce, Sol(srce)) ∈
CB is formatted into two sets of boolean properties: Φ(srce) and
Φ(Sol(srce)). The computation of Φ(srce) consists in translating
srce from the problem representation formalism to 2^V, V being a
set of boolean properties. Possibly, some information may be lost
during this translation, but this loss has to be minimized. Now, this
translation turns an expression srce, stated in the framework
of the domain ontology O, into an expression Φ(srce) that will be
manipulated as data, i.e. without the use of a reasoning process.
Therefore, in order to minimize the translation loss, it is assumed
that if p ∈ Φ(srce) and p entails q (given O) then q ∈ Φ(srce). In
other words, Φ(srce) is assumed to be deductively closed given O in
the set V. The same assumption is made for Φ(Sol(srce)). How this
first substep of formatting is computed in practice depends heavily
on the representation formalism of the cases.
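The deductive closure assumption can be illustrated as follows. This is a minimal sketch that replaces the ontology O by a hypothetical table `entails` of direct entailments between boolean properties; actual reasoning over an ontology is far richer than this transitive closure.

```python
from typing import Dict, FrozenSet, Iterable, Set


def deductive_closure(props: Iterable[str],
                      entails: Dict[str, Set[str]]) -> FrozenSet[str]:
    """Add every property q entailed, directly or not, by some p in props."""
    closed = set(props)
    to_visit = list(closed)
    while to_visit:
        p = to_visit.pop()
        for q in entails.get(p, ()):
            if q not in closed:
                closed.add(q)
                to_visit.append(q)
    return frozenset(closed)


# Hypothetical entailments standing in for the ontology: p => q and q => r.
toy_entailments = {"p": {"q"}, "q": {"r"}}
phi_srce = deductive_closure({"p", "s"}, toy_entailments)
```

Closing Φ(srce) in this way means that the mining step, which does no reasoning, still sees the general properties (here q and r) shared by cases whose specific properties differ.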

The second substep of formatting produces a transaction T =
Φ((srce-case1, srce-case2)) for each ordered pair of distinct
source cases, based on the sets of items Φ(srce1), Φ(srce2),
Φ(Sol(srce1)) and Φ(Sol(srce2)). Following the model of
adaptation presented in the introduction (items ①, ② and ③), T has to
encode the properties of Δpb and Δsol. Δpb encodes the similarities
and dissimilarities of srce1 and srce2, i.e.:

• The properties common to srce1 and srce2 (marked by "="),

• The properties of srce1 that srce2 does not share ("-") and

• The properties of srce2 that srce1 does not share ("+").

All these properties are related to problems and thus are marked by
pb. Δsol is computed in a similar way and T = Δpb ∪ Δsol.
For example,

\[
\text{if }
\begin{cases}
\Phi(\mathrm{srce}_1) = \{a, b, c\} & \Phi(\mathrm{Sol}(\mathrm{srce}_1)) = \{A, B\} \\
\Phi(\mathrm{srce}_2) = \{b, c, d\} & \Phi(\mathrm{Sol}(\mathrm{srce}_2)) = \{B, C\}
\end{cases}
\text{ then }
T = \{a^{-}_{pb},\, b^{=}_{pb},\, c^{=}_{pb},\, d^{+}_{pb},\, A^{-}_{sol},\, B^{=}_{sol},\, C^{+}_{sol}\}
\qquad (1)
\]
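The second substep can be sketched as follows, reproducing the example above. The encoding of an item as a string such as "a-pb" (property, mark, then pb or sol) is an assumption made for the illustration, as are the function names.

```python
from typing import FrozenSet

Props = FrozenSet[str]


def mark(first: Props, second: Props, kind: str) -> FrozenSet[str]:
    """Mark each property with '=', '-' or '+' and with its kind (pb or sol)."""
    items = set()
    for p in first | second:
        if p in first and p in second:
            sign = "="   # common to both
        elif p in first:
            sign = "-"   # in the first case only
        else:
            sign = "+"   # in the second case only
        items.add(f"{p}{sign}{kind}")
    return frozenset(items)


def transaction(phi_srce1: Props, phi_sol1: Props,
                phi_srce2: Props, phi_sol2: Props) -> FrozenSet[str]:
    """Transaction for the ordered pair (srce-case1, srce-case2)."""
    delta_pb = mark(phi_srce1, phi_srce2, "pb")
    delta_sol = mark(phi_sol1, phi_sol2, "sol")
    return delta_pb | delta_sol


T = transaction(frozenset("abc"), frozenset("AB"),
                frozenset("bcd"), frozenset("BC"))
```

Because the pair is ordered, swapping the two cases yields a different transaction in which the "-" and "+" marks are exchanged.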

Mining. The extraction of FCIs is computed thanks to
CHARM (in fact, thanks to a tool based on a CHARM-like
algorithm) from the set of transactions. A transaction T =
Φ((srce-case1, srce-case2)) encodes a specific adaptation
((srce1, Sol(srce1)), srce2) ↦ Sol(srce2). An extracted FCI
may be considered as a generalization of a set of transactions. For
example, if $I_{ex} = \{a^{-}_{pb}, c^{=}_{pb}, d^{+}_{pb}, A^{-}_{sol}, B^{=}_{sol}, C^{+}_{sol}\}$
is an FCI, $I_{ex}$ is a generalization of a subset of the transactions
including the transaction T of equation (1): $I_{ex} \subseteq T$. The
interpretation of this FCI as an adaptation rule is explained below.



Interpretation. The interpretation step is supervised by the
analyst. The CABAMAKA system provides the analyst with the extracted
FCIs and facilities for navigating among them. The analyst may
select an FCI, say I, and interpret I as an adaptation rule. For example,
the FCI $I_{ex}$ may be interpreted in the following terms:

if a is a property of srce but is not a property of tgt,
   c is a property of both srce and tgt,
   d is not a property of srce but is a property of tgt,
   A and B are properties of Sol(srce) and
   C is not a property of Sol(srce)
then the properties of Sol(tgt) are
   Φ(Sol(tgt)) = (Φ(Sol(srce)) \ {A}) ∪ {C}.

This has to be translated as an adaptation rule r of the CBR system. 
Then the analyst corrects r and associates an explanation with it. 
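To make this reading of an FCI concrete, the rule above can be sketched as a small data structure with a matching test and an application step. The class `AdaptationRule` and its field names are hypothetical and only mirror the "-", "=" and "+" marks; the actual rules are expressed in the formalism of the CBR system and are corrected by the analyst.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional

Props = FrozenSet[str]


@dataclass(frozen=True)
class AdaptationRule:
    pb_removed: Props   # properties of srce that tgt does not share ("-")
    pb_kept: Props      # properties common to srce and tgt ("=")
    pb_added: Props     # properties of tgt that srce does not share ("+")
    sol_removed: Props  # properties of Sol(srce) dropped in Sol(tgt)
    sol_kept: Props     # properties of Sol(srce) kept in Sol(tgt)
    sol_added: Props    # properties added to Sol(tgt)

    def applies(self, srce: Props, tgt: Props, sol_srce: Props) -> bool:
        """Check the 'if' part of the rule."""
        return (self.pb_removed <= srce - tgt
                and self.pb_kept <= srce & tgt
                and self.pb_added <= tgt - srce
                and (self.sol_removed | self.sol_kept) <= sol_srce
                and not (self.sol_added & sol_srce))

    def apply(self, srce: Props, tgt: Props, sol_srce: Props) -> Optional[Props]:
        """Return the properties of Sol(tgt), or None if the rule does not apply."""
        if not self.applies(srce, tgt, sol_srce):
            return None
        return (sol_srce - self.sol_removed) | self.sol_added


# The example rule read off the FCI discussed above.
rule_ex = AdaptationRule(
    pb_removed=frozenset("a"), pb_kept=frozenset("c"), pb_added=frozenset("d"),
    sol_removed=frozenset("A"), sol_kept=frozenset("B"), sol_added=frozenset("C"),
)
sol_tgt = rule_ex.apply(frozenset("abc"), frozenset("bcd"), frozenset("AB"))
```

The rule does not mention b, so it applies whether or not b holds in srce and tgt; this is the sense in which an FCI generalizes the transactions that contain it.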

Implementation. The application domain of the CBR system we 
are developing is breast cancer treatment: in this application, a prob- 
lem pb describes a class of patients with a set of attributes and asso- 
ciated constraints (holding on the age of the patient, the size and the 
localization of the tumor, etc.). A solution Sol(pb) of pb is a set of 
therapeutic decisions (in surgery, chemotherapy, etc.). The requested 
behavior of the CBR system is to provide a treatment and explana- 
tions on this treatment proposal. This is why the analyst is required 
to associate an explanation to a discovered adaptation rule. 

The problems, solutions and the domain ontology of the applica- 
tion are represented in OWL DL (recommendation of the W3C). 

3 CONCLUSION 

The CABAMAKA system presented in this paper is inspired by the
research presented in [3] and by the principles of KDD for the
purpose of semi-automatic adaptation knowledge discovery. It has
made it possible to discover several useful adaptation rules for a medical CBR
application. It has been designed to be reusable for other CBR appli- 
cations: only a few modules of CABAMAKA are dependent on the 
formalism of the cases and of the domain ontology, and this formal- 
ism, OWL DL, is a well-known standard. One element of future work 
consists in searching for ways of simplifying the presentation of the 
numerous extracted FCIs to the analyst. This involves an organiza- 
tion of these FCIs for the purpose of navigation among them. Such 
an organization can be a hierarchy of FCIs according to their speci- 
ficities or a clustering of the FCIs in themes. 